\input ijcai 	% Then CogSciJournal formatting macros
\def\TITL #1{
 \titlepage\ninepoint
 \runninglefthead{#1}
 \vfill
 \advcount8
 \runningrighthead{{ }} 
 \section{{ }}
 \eject
 \ctrline{\:=#1}
	\vskip 4pt plus 2pt
	\acpmark{\chd}{\csec}
	\noindent\ninepoint\!
}
\def\sectionskip{\penalty-60\vskip 3pt plus 2pt minus 1pt}
\def\ASEC #1{
 \titlepage\ninepoint
 \vfill\eject
 \advcount4
 \gdef\grrh{#1}
 \runningrighthead{#1}
 \section{App. \count4 }
 \sectionskip
 \asecbegin{\count4. #1}
 \setcount5 0
% \setcount9 0
 }
\def\NSECP #1{
 \titlepage\ninepoint
 \vfill\eject
 \advcount4
 \gdef\grrh{#1}
 \runningrighthead{#1}
 \section{\count4}
 \sectionskip
 \sectionbegin{\count4. #1}
 \setcount5 0
%	 \setcount9 0      FOR COG SCI, DON'T RESTART FOOTNOTE NUMBERS
 }
\hsize 6xgpin \vsize 8xgpin \maxdepth 2pt \parindent 19pt \topbaseline 10pt
\parskip 20pt plus 3pt minus 1pt \lineskip 8pt
\topskip 27pt plus 5pt minus 2pt  \botskip 3pt plus 9pt
\output{\baselineskip 0pt\lineskip0pt	% beginning of output routine, resets skips
	\vjust to 9.5xgpin{         % prepare the full page of this fixed height
 		\vskip 27pt	% no page nos at top of preface material
		\page 		% insert the page contents
		\vfill		 % extra space before the page number
	}			% completion of the \vjust
	\advcount0}		% increase page number by 1 and end output routine
\jpar 100

\TITL{COGNITIVE  ECONOMY}

%  \vskip 9pt

 \ctrline{\:=In a Fluid Task Environment}

 \vskip 17pt


{\:q Douglas B. Lenat, \  Frederick Hayes-Roth, \  and Philip Klahr}


\vskip 1.5xgpin   	% Footnote material for title page


\eightpoint

\parindent 0pt

Dr. Lenat is an assistant professor in
the Computer Science Department, Stanford University,
Stanford, Ca. 94305.
Drs. Hayes-Roth and Klahr are researchers in the Information Sciences
Department of the Rand Corporation, Santa Monica, Ca.

This paper describes work in progress, sponsored by the National
Science Foundation under grants MCS77-04440 and MCS77-03273.  It is
addressed to a technical audience familiar with the concepts of artificial
intelligence, knowledge representation, and knowledge engineering.  
Although it does present some concrete research results,
it is
being disseminated at this time primarily to stimulate discussion and interaction
with colleagues on these issues.

\ninepoint

\vfill

Running Title:  COGNITIVE ECONOMY


\setcount0 1

\vfill\eject

\output{\baselineskip 0pt\lineskip0pt	% beginning of output routine, resets skips
	\vjust to 8.5xgpin{         % prepare the full page of this fixed height
 		\ctrline{\:c --- \count0 --- }
		\vskip 20pt
		\page 		% insert the page contents
		\vfill		 % extra space before the page number
	}			% completion of the \vjust
	\advcount0}		% increase page number by 1 and end output routine

\tenpoint

{\bf Abstract}

\ninepoint

\vskip 1xgpin  		% Here goes the abstract

\noindent Intelligent systems can explore only tiny subsets of their potential
external and conceptual worlds.  To increase their effective
capacities, they must develop efficient forms of representation,
access, and operation.  
If forced to survive in a changing task environment,
most inferential systems could benefit
from
{\it learning}
about that environment and their own behavior.  For example,
they can exploit new schemata or different slot-names to simplify and
restructure their knowledge.  
In this paper we describe a program that automatically extends its
schematized representation of knowledge, by defining appropriate new slots
dynamically.  We then review some very general techniques for
increasing the efficiency of programs without sacrificing
expressibility:
caching, abstraction, and
expectation-simplified processing.
It is shown in detail how one of these, caching, regains the efficiency that
otherwise would be lost in adopting the interpretive slot-defining scheme
presented earlier.  

\vskip 11pt

\tenpoint

\vfill

\eject

\NSECP{Introduction}

\parindent 19pt

Computer programs, no less than biological creatures, must perform 
in an environment: an externally imposed set of demands, pressures,
opportunities, and regularities.
{\it Cognitive economy} is the degree to which a program is adapted to its
environment, the extent to which
its internal capabilities (structures and processes) accurately
and efficiently reflect
its environmental niche.
If the environment changes frequently and radically, then the program
(or species or organization) should monitor those changes and be capable of
adapting itself to them dynamically.  This is cost-effective when
the penalty for failure to adapt exceeds the sum of the following
costs: (i) the continuous overhead of monitoring to detect changes,
(ii) the one-time cost of providing a mechanism by which self-modification
can occur, and (iii) the occasional expense of employing that mechanism to
respond to a change which has been observed.  

\vskip 1.5xgpin

% 	Put the equation here -- in a box, with (1) at margin
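
Schematically --- writing $Penalty$ for the loss incurred by failing to adapt, and
$Cost↓{monitor}$, $Cost↓{mechanism}$, and $Cost↓{respond}$ for the three costs just
enumerated (the last summed over the changes actually responded to) --- the condition
might be rendered roughly as:

\ctrline{$Penalty \ > \ Cost↓{monitor} \ + \ Cost↓{mechanism} \ + \ \sum Cost↓{respond}$ \hskip 20pt (1)}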


There are two directions for exploring this phenomenon:

\noindent $\bullet $ 
Make the preceding equation less mushy.  Investigate what features of the
performer (program), task environment, changes in that environment, etc.,
make the ``monitoring \& self-modification'' process cost-effective.
Clearly, in some cases the penalty
for non-adaptation {\it is} high enough; 
as examples spanning several time scales
consider biological evolution (millenia),
the educational system of a culture (years),
the business meetings of a corporation (months),
the immune system of an organism (hours),
and
the nervous system of an organism (milliseconds).
In other cases, the above equation tips {\it against} adaptation as being
cognitively economical; 
to date, almost all computer programs (and cognitive models in general)
have been constructed to deal with a fixed task, or at least with a
fixed task environment [Newell \& Simon 1972].  But the magnitude of the
phenomena we seek to model continues to grow, and we begin to
feel the confines of any single, unchanging model we hypothesize.
In the years ahead, our models --- be they in Lisp, equations, or prose ---
must become increasingly responsive.   In the conclusions of this paper,
we provide some further thoughts on this research direction.

\noindent $\bullet $ 
Operationalize the vague phrase ``monitoring \& self-modification.''
In the case of computer models, how might they select (or {\sl discover})
timely new knowledge, new control algorithms,
new knowledge representations?   What are the difficulties encountered?
In this paper we present some
specific techniques by which such programs can monitor their
runtime environment (Appendix 4) 
and modify themselves to heighten their fitness to it (Section 2).
We describe how one particular program, Eurisko, dynamically defines useful
new slots automatically.

\NSECP{Dynamic Self-Modification}


\yskip


{\bf Summary:\  \sl
We illustrate various ways in which a program might turn to
advantage knowledge gleaned dynamically: selecting (or perhaps even {\it 
discovering}) new data structures, representations, and algorithms, which
seem better suited to the current runtime environment, the current user, 
the current problem, etc.}

\yskip

In order for a program to dynamically modify its own
processing, it must examine, integrate and apply knowledge of
the task domain, knowledge about programming in general, knowledge about the
actual runtime environment, and knowledge gathered from observing its own
processing behavior.  Appendix 4 sketches ways in which dynamic program
performance could be self-monitored.  This section draws attention to the
possibility of applying such usage data to modifying --- and in the extreme
re-synthesizing --- the program.

Tasks can be specified anywhere from a pure ``What'' (desired
input/output behavior) to a pure ``How'' (precise algorithms, data structures,
representations, formats).  It has been observed [Feigenbaum 1977] that one
of the goals of AI research is to push our ability from the How end toward
the What end of that spectrum.  This often takes the form of designing a new
very$↑{n+1}$-high level language; i.e., higher
than any that hitherto existed, one which takes over
more of the program implementation.  We can see this as far back as
assemblers replacing machine language, and as recently as KRL [Bobrow and
Winograd 1977] and the Molgen Units Package [Stefik 1978]  extending
Interlisp. 

Assuming, then, that the user
desires merely to specify What, an intelligent language (automatic code
synthesizer) must decide How.  Upon what can it base its design?
Previous efforts have employed knowledge about program transformations
[Burstall and Darlington 1977] [Balzer et al. 1977], programming techniques and
task-specific knowledge [Green and Barstow 1978], and
 --- if the input language is at all complex --- information to enable
the successful translation of the input specification into a sufficiently
precise specification of What [Lenat 1975].
A couple of recent efforts have begun to guide the decision-making during code
synthesis by algorithmic analysis [Kant 1977] or even by aesthetics
[Cohen 1979][Kahn 1979]
[Lenat 1976].  But all these methods are
{\it static}: they feed off a fixed base of initially-supplied knowledge,
heuristics, constraints, suggestions, facts.  The user supplies a
specification for the desired program, and the knowledge base is employed
to produce that target.


What might it mean for a program to modify itself dynamically?
The simplest kind of adaptive abilities along this line would be to {\it select}
a data structure, a representation, or an algorithm from a set of well-known
choices
--- and to reserve the right and the ability to change that decision later.
One could gather knowledge about the relative strengths and weaknesses 
of each choice [Kant 1977].  This knowledge
could be in the form of rules which ask not only about static data, but which
sample the runtime environment of the program as well. See Figure 1.

\vskip 10pt

\noindent ``If
one conjunct appears to be false more often than any other, make it the first
one tested in the conjunction''  

\noindent ``If each person will use the program only
once, briefly, it is not worth building a model of each such user''

\noindent ``A function that is used frequently and changed rarely should be
compiled.''

\noindent ``If items always appear to be accessed in the same order, Then
a sequential data structure is probably adequate for them.''


\vskip 10pt

\ctrline{\bf [FIGURE 1: Environment-driven selection rules]}

\vskip 15pt

Such design questions could in most cases
be answered initially (during program synthesis), but
our suggestion of automating the gathering of such data at run time is a further step
toward the ``What" end of the spectrum.

There is another possibility, much more exotic than the previous one,
which deserves attention: the automated
discovery --- in ``real time" --- of new
types of data structures and representations which would be useful
for the program in its current environment.

Defining a new data structure is not as difficult as it may first appear:
a data structure can be thought of abstractly in terms of a set of operations
one wants to be able to perform on it (e.g., First, Last, Next, and Previous, for
lists).  The set of desired operations need not be minimal nor concern itself
with which operations are most important, etc.  The run-time usage of such
a data structure can be sampled, and that can guide the search for an
ever more appropriate implementation (e.g., in one usage, it might be
very useful for Last to be efficient, and Interlisp's {\:cTCONC}ing might be
chosen as an implementation; in another
environment, it might be useful for Next and Previous to be efficient, and
a doubly-linked implementation would be best).

The two points here are (i) the implementation of a data structure
may depend upon how it is commonly used in a particular spot in a program,
by a particular user, and (ii) a new kind of data structure may be defined
abstractly merely by selecting a set of operations; this choice, too, can
be made by examining the run-time data as it accumulates (e.g., the final
rule in Fig. 1.)
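
A small illustrative sketch of these two points, in present-day Lisp rather than
Eurisko's Interlisp (every name here --- {\:c NOTE-USE}, {\:c SEQ-LAST}, {\:c SEQ-NEXT},
{\:c SUGGEST-IMPL} --- is invented for the occasion): the abstract operations tally
their own usage, and that tally drives the choice of implementation.

{\:c (DEFVAR *OP-COUNTS* NIL)   ; one (operation . count) pair per abstract operation}

{\:c (DEFUN NOTE-USE (OP)   ; tally one use of abstract operation OP}

{\:c \hjust{\hskip 20pt (LET ((PAIR (ASSOC OP *OP-COUNTS*)))}}

{\:c \hjust{\hskip 30pt (IF PAIR (INCF (CDR PAIR)) (PUSH (CONS OP 1) *OP-COUNTS*))))}}

{\:c (DEFUN SEQ-LAST (S) (NOTE-USE 'LAST) (CAR (LAST S)))}

{\:c (DEFUN SEQ-NEXT (S X) (NOTE-USE 'NEXT) (CADR (MEMBER X S)))}

{\:c (DEFUN SUGGEST-IMPL ()   ; pick an implementation from the observed usage}

{\:c \hjust{\hskip 20pt (IF (> (OR (CDR (ASSOC 'LAST *OP-COUNTS*)) 0)}}

{\:c \hjust{\hskip 40pt (OR (CDR (ASSOC 'NEXT *OP-COUNTS*)) 0))}}

{\:c \hjust{\hskip 30pt 'TCONC-STYLE 'DOUBLY-LINKED))}}

If {\:c SEQ-LAST} dominates the tally, a {\:c TCONC}-style representation (with a tail
pointer) is indicated; if {\:c SEQ-NEXT} and its inverse dominate, a doubly-linked one.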

Defining a new {\it representation} is not quite so neat. 
In fact, humans have only
managed to find a handful of representations to date. In lieu of any constructive
suggestions along that line, we shall focus on {\it extending} a 
given representation:
This appears to be intermediate in difficulty between {\it selecting} a representation and
{\it discovering} a new one.

               \SSEC{Extending a Schematized Representation}


{\bf Summary:\  \sl
The Eurisko program extends its schematized representation
by defining new types of
slots. It uses a very simple grammar for defining new slots from old ones (legal
moves), and a corpus of heuristic rules for
guiding and constraining that process (plausible moves).}

\yskip

For a schematized representation,
``extension" could
occur by defining
new types of slots.  The Eurisko program (an extension of the AM 
program [Lenat 1976]) has this capability, and we shall
briefly describe how this mechanism was developed.

First, we isolated several ways in which new slots are defined from old
ones:

\yskip

\parskip 12pt plus 1pt minus 1pt \lineskip 5pt

\noindent {\bf STAR}  (e.g., Ancestor $=↓{df}$ Parent$↑*$, which means a Parent of a
Parent of... of a Parent of me; the general case should be apparent);

\noindent {\bf UNION} (e.g., Parent $=↓{df}$ Father $\union$ Mother); 

\noindent {\bf COMPOSITION}
(e.g., First-Cousin $=↓{df}$ Child$\circ$Sibling$\circ$Parent; i.e., any
child of any sibling of either of my parents); 

\noindent {\bf DIFFERENCE}
(e.g., Remote-ancestor $=↓{df}$ Ancestor $-$ Parent).

\parskip 20pt plus 3pt minus 1pt \lineskip 8pt

\yskip

These four operators which define new types of slots 
($\,↑*,\  \union,\ \circ,\ -$) are called {\it slot-combiners}.
Other slot-combiners will be mentioned later.
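
To fix ideas, here is one way those four slot-combiners might be rendered as
higher-order functions in present-day Lisp (a sketch of ours, not Eurisko's actual
Interlisp code; we assume each slot reader returns a {\it list} of concepts --- so a
reader for Genl or Parent returns a one- or two-element list --- and we ignore
duplicates and cycles):

{\:c (DEFUN STAR (F)   ; F* : the concept itself, plus F of it, plus F of F of it, ...}

{\:c \hjust{\hskip 20pt (LAMBDA (C) (CONS C (MAPCAN (STAR F) (FUNCALL F C)))))}}

{\:c (DEFUN UNION-SLOT (F G)   ; F union G}

{\:c \hjust{\hskip 20pt (LAMBDA (C) (UNION (FUNCALL F C) (FUNCALL G C))))}}

{\:c (DEFUN COMPOSE (F G)   ; F o G : all F-values of all G-values}

{\:c \hjust{\hskip 20pt (LAMBDA (C) (MAPCAN (LAMBDA (X) (COPY-LIST (FUNCALL F X))) (FUNCALL G C))))}}

{\:c (DEFUN DIFF (F G)   ; F - G}

{\:c \hjust{\hskip 20pt (LAMBDA (C) (SET-DIFFERENCE (FUNCALL F C) (FUNCALL G C))))}}

With {\:c PARENT-OF} as a (hypothetical) reader for the Parent slot,
{\:c (STAR (FUNCTION PARENT-OF))} is then a reader for Ancestor, and
{\:c (DIFF (STAR (FUNCTION PARENT-OF)) (FUNCTION PARENT-OF))} one for Remote-ancestor.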

\yskip

Next, we added to Eurisko's knowledge base a concept for each type of slot, and
a concept for each type of slot-combiner. Figure 2 shows some of the
Generalizations and Star concepts. Note in particular the way in which
Generalizations is defined as Genl$↑*$ --- i.e., immediate generalizations,
{\it their} immediate generalizations, and so on. 

\vskip 3.8xgpin

\ctrline{\bf[FIGURE 2: Generalizations and Star concepts]}

% \yyskip

% \yyskip

Finally, we modified the way in which slot entries are accessed.
To illustrate this, we choose a simple task in mathematics, whose paraphrase
is, ``What are all the generalizations of the concept `primes'?"  The
traditional way in which {\:c (GET PRIMES GENERALIZATIONS)} would work is
to examine the property list of {\:c PRIMES}, looking for the attribute
{\:c GENERALIZATIONS}; if found, the entries listed there would be returned.
If, as in Fig. 3, there is no such attribute, the call upon {\:c GET} will
return {\:c NIL} (the empty list).

\vfill\eject

.

\vskip 3.0xgpin


\ctrline{\bf[FIGURE 3: Primes concept]}

\vskip 3.0xgpin

\ctrline{\bf[FIGURE 4: network of concepts from Primes up]}

\yyskip

We modified the
way in which any retrieval request of the form 
{\:c (GET C F)}
operates. In case the 
{\:c F}
 attribute of
{\:c C}
 has no entries (or doesn't exist),  Eurisko examines the definition of
{\:c  F}
and --- if one exists --- tries to use it to compute the entries that could
legally be stored on the 
{\:c F}
 attribute of 
{\:c C}.  More precisely, before quitting
we try to 
{\:c (GET F DEFN),}
 and if that succeeds we apply it to 
{\:c C.}
Let's continue the example of 
{\:c (GET PRIMES GENERALIZATIONS).}
 As we
see in Fig. 3 above, there are none recorded. So 
{\:c GET}
 now calls itself
recursively; our original call is replaced by
{\:c ((GET GENERALIZATIONS DEFN)  PRIMES).}
But as Fig. 2 shows, there is no slot labelled 
{\:c DEFN}
 on the concept for
{\:c GENERALIZATIONS}.
So we recur one more time. By now our call has become

{\:c (((GET DEFN DEFN) GENERALIZATIONS) PRIMES).}\foo{I.e.,
get the expression stored in the Defn slot of the Defn concept; call it F.
F should exist, and should be a $\lambda $-expression (a function)
which takes the name of a slot S as argument, and computes a
function which tells how to read slot S for any concept.  Eurisko
applies F to Generalizations, and that yields as a result a new function,
a $\lambda $-expression, call it G. G tells how to find generalizations
of any concept.  In particular, when Eurisko applies G to Primes, the result
is a list of the legal entries for the Generalizations slot of Primes.
Quote marks (') have been omitted for readability.}

Here is part of the 
{\:c DEFN}
 concept:

\vfill\eject

.

\vskip 2.0xgpin

\ctrline{\bf[FIGURE 5: DEFN concept]}

\yyskip


Luckily, it does have a 
{\:c DEFN}
 slot, so we end the recursion.
Applying the entry stored there to the argument ``{\:c GENERALIZATIONS},"
we see our original call becoming transformed into

{\:c [((GET (GET GENERALIZATIONS SLOT-COMBINER) DEFN)}

{\:c \hjust{\hskip 29pt (GET GENERALIZATIONS BUILT-FROM))}}

{\:c \hjust{\hskip 23pt PRIMES]}}

We see from Fig. 2 that
the slot-combiner of Generalizations is ``Star,"
and the argument
(old slot) which it is built from is ``Genl."  So the entire call
shrinks into
{\:c (((GET STAR DEFN) GENL) PRIMES).}
The Star concept has an entry for its Defn slot;
it turns the preceding call into
{\:c (($\lambda$ (c) (AND c (CONS c (self (GET c GENL)))))   PRIMES). }
This function first calls for 
{\:c (GET PRIMES GENL),}
 which is 
{\:c NUMBERS,}
 then calls
itself on 
{\:c NUMBERS;}
 that in turn calls for 
{\:c (GET NUMBERS GENL),}
 which is
{\:c OBJECTS,}
 and calls itself recursively on 
{\:c OBJECTS;}
 that calls for
{\:c (GET OBJECTS GENL),}
 which is 
{\:c ANYTHING,}
 and the next recursive call terminates
when it is discovered that 
{\:c ANYTHING}
 has no 
{\:c GENL}
 (no immediate generalization.)  See Fig. 4.
The list 
{\:c CONS}tructed and returned is thus 
{\:c (PRIMES NUMBERS OBJECTS ANYTHING).}
These four items {\it are} the legal entries for the 
{\:c GENERALIZATIONS}
 slot of
{\:c PRIMES, }
according to the distributed definition of 
{\:c GENERALIZATIONS} as {\:c GENL}$↑*$.
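
The machinery just traced is quite compact.  Here is a schematic reconstruction in
present-day Lisp (Eurisko itself is written in Interlisp; we call the access function
{\:c VGET} to avoid colliding with Lisp's built-in {\:c GET}, and we assume each concept
is a symbol whose physically stored slots live on its property list):

{\:c (DEFUN VGET (C F)   ; ``virtual'' GET}

{\:c \hjust{\hskip 20pt (OR (GET C F)   ; any physically stored entries?}}

{\:c \hjust{\hskip 30pt (LET ((DEFN (VGET F 'DEFN)))   ; else recursively fetch F's definition ...}}

{\:c \hjust{\hskip 40pt (AND DEFN (FUNCALL DEFN C)))))   ; ... and apply it to the concept C}}

{\:c (SETF (GET 'DEFN 'DEFN)   ; the recursion bottoms out here: DEFN's own definition is stored}

{\:c \hjust{\hskip 20pt (LAMBDA (S)   ; given a slot name S, build a reader for slot S}}

{\:c \hjust{\hskip 30pt (FUNCALL (VGET (GET S 'SLOT-COMBINER) 'DEFN)   ; the combiner's definition ...}}

{\:c \hjust{\hskip 40pt (GET S 'BUILT-FROM))))   ; ... applied to the old slot it is built from}}

With the contents of Figs. 2 through 5 loaded as property-list entries,
{\:c (VGET 'PRIMES 'GENERALIZATIONS)} reproduces the chain of calls traced above and
returns {\:c (PRIMES NUMBERS OBJECTS ANYTHING)}.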

Notationally there is no distinction between slots which are ``primitive"
(values actually stored as attributes on a property list) and slots which
are ``virtual" (values must be computed using the slot's definition).
A heuristic might refer to the Generalizations of Primes without knowing,
or caring, whether that initiated a single access or a dizzy chase.
Thus we are decoupling, disentangling knowledge (the rules) from the
specific implementation of its representation (the choice of which slots
are primitive and which are virtual.) 
This is required because knowledge remains invariant under changes in
the environment which affect the worth of various representation schemes.
Michie recognized this long ago with his memo functions.

To define a new kind of slot, now, Eurisko merely specifies one of the
then-known
slot-combiners and a list of the
pre-existing slots from which it is built.
Thus we might define a new slot by creating a new concept (calling it,
say, 
``{\:c DG}"), filling its
Slot-combiner slot with the entry ``Difference", and filling its 
``Built-from" slot with the arguments ``Generalizations Genl."  This would
be a new kind of slot, one which returned all generalizations of a concept
except its immediate generalizations; the call 
{\:c (GET PRIMES DG)}
 would return
{\:c (PRIMES OBJECTS ANYTHING).   }

With no extension at all, Eurisko was then able to define new
kinds of slot-combiners. For instance, it proposed one which took
two old slotnames as arguments, f and g, and defined a new slot which was
f$↑*\circ\,$g$\,\circ\,$f$↑*$. This could be extremely useful (e.g., in database
searches: see Fiksel and Bower [1976]).  In particular the crucial
slot ``Examples" is defined as Spec$↑*\circ\,$Immed-Exs$\,\circ\,$Spec$↑*$.
{\:c PLUS} (in the sense of regular expressions, not arithmetic) is defined by
Eurisko as the difference between f$↑*$ and the identity $i$.  Genl$↑+$ is found to
be more worthwhile than Genl$↑*$, and {\:c PLUS} itself is ultimately favored
over {\:c STAR.}

	\SSEC{CACHING to reclaim lost efficiency}

{\bf Summary:\  \sl 
``Caching'' the results of computations can dramatically improve the
performance of many programs.  Reasoning can be brought to bear to decide
whether to cache; if so, what to remember; and (later) whether to ignore 
the cached value and recompute it.  Caching reclaims the efficiency that could
otherwise be sacrificed by the highly indirect representation-interpreting
scheme described in the previous section.
}


\yskip

The reader may object that Eurisko's scheme for indirectly interpreting the
definition of each slot is a horribly inefficient way to operate, since
the slots' definitions change so rarely. 
It seems absurd to squander cpu time recomputing {\:c (GET PRIMES
GENERALIZATIONS)} each time.  After at most one or two repetitions,
we would want our program to have ``learned'' an efficient way to get an
answer.

Indeed, a general tool for
cognitive economy
 --- software caching --- 
can be exploited to reclaim most of this efficiency.
By a  ``general" tool, we mean one which is applicable in almost all
environments, not merely those being focused upon in this paper (fluid ones.)

Immediately upon computing (GET c f), Eurisko stores the computed value {\it v}
in the f slot of concept c ({\it a la} memo functions.)
Thus, the next time (GET c f) is called, it will
find {\it v} and return with it immediately.  After
(GET PRIMES GENERALIZATIONS) was called, the list (PRIMES NUMBERS OBJECTS
ANYTHING) was stored on the GENERALIZATIONS slot of PRIMES.  Recall that 
Eurisko had to recursively call (GET GENERALIZATIONS DEFN), so the
$\lambda$-expression {\it that} call returned would be stored on the
DEFN slot of the concept called GENERALIZATIONS.  When we later call
(GET DUCK GENERALIZATIONS), that call will run much faster, since the
definition of the Generalizations slot has already been computed and
stored away in the ``proper place.''
The definitions of
slots change very slowly --- if ever --- hence the recomputation of the
Defn of Generalizations is quite a rare event, and caching that value is
almost certain to be cost-effective in the long run.
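
In terms of the {\:c VGET} sketch of the previous subsection, the change is a single
additional form --- store whatever was just computed (again schematic; the real
mechanism records more than the bare value, as described below):

{\:c (DEFUN VGET (C F)   ; as before, but now memoizing}

{\:c \hjust{\hskip 20pt (OR (GET C F)}}

{\:c \hjust{\hskip 30pt (LET ((DEFN (VGET F 'DEFN)))}}

{\:c \hjust{\hskip 40pt (AND DEFN (SETF (GET C F) (FUNCALL DEFN C))))))   ; cache the value on C itself}}

The {\:c SETF} both stores the value on {\:c C}'s property list and returns it, so the
next call finds it by the ordinary {\:c (GET C F)} path.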

In general, we see that caching a value for slot F of concept C is
most applicable when the value can be expected to vary quite infrequently.
In Eurisko, this gives us the flexibility to redefine a slot's definition
if we wish, but (since this change of representation
will be rare) the program will run just about as
efficiently as if that capability
were absent.  This is analogous to compiling: so long as the definition of
a function doesn't change too often, it's undisputedly worthwhile to compile
it.   The caching of 
{\:c DEFN}
 and
{\:c  GENERALIZATIONS}
 slots is not in any way special;
the value of any virtual slot can be cached.


We saw earlier that notationally 
there was no need
to distinguish an access of a primitively-stored slot from an access of a
virtual, computed slot.  E.g., a heuristic could refer to ``Generalizations
of Primes'' without knowing or caring whether that initiated a single access
or a long rippling search.
Now we see that once the value is computed and
cached away, there may be no telling it from a primitive one.
Thus, with but a small one-time cost, our program runs as quickly as if
all slots were primitive.  The inefficiency introduced by the
interpretive slot-combiner scheme has been eliminated.

The values that Eurisko caches away may become stale or incorrect after
a while.  Therefore, when a cached value is encountered during a GET,
Eurisko must decide whether to accept it,
or else to ignore it, recompute it, and store the new value there.
One possible answer to this ``updating problem'' is the following
conservative one: whenever a value for a slot of a concept changes, discard any
cached values that might {\it possibly} have depended on it.
Arbitrary amounts of reasoning may be brought to bear to decide which
values needn't be discarded to maintain consistency of the knowledge base,
but of course if too much time is spent in this pursuit the efficiency
advantages of caching disappear.  Eurisko carries on AM's tradition of
living with contradictions, with a plausible but not ``correct'' knowledge
base.  Namely, it has some heuristics (see Fig. 7)
for deciding whether to ignore or
accept any cache it encounters during a GET operation.

We added
some extra arguments to 
{\:c GET,}
 parameters which describe a
cost/benefit curve:
for each amount of resources expended and each
degree of precision of the result, this curve specifies precisely how
acceptable that resource consumption / accuracy of
solution pair is.   One extreme of this is to provide
an ironclad limit on resources
and say ``Do the best you can
within these time/space/... limits."  Another extreme
(most common in present-day programs) is to specify
an ironclad goal criterion (``Find an answer which is at
least {\sl this} good, no matter how long it takes to get.")
We are calling attention to the space in between
the two extremes.

To bring us back from the lofty heights of generality, let's see in more
detail how we got Eurisko to do this. 
First, we linearized this space:  we picked a
few significant parameters of it, and defined a description to be merely
a list of values for these parameters.
When a call on 
{\:c GET}
 was executed, and a cached value was encountered, a
formula relating these extra arguments to 
{\:c GET}
 would then determine whether
to accept that cache, or to ignore it and recompute the value.
Eurisko's parameters (extra arguments to 
{\:c GET}) are:
cpu time to expend, space to expend, whether the user can be asked about
this, how fresh the cache must be, minimum amount of resources which must have
been expended at the time the cache was written.

So
{\:c  (GET PRIMES GENERALIZATIONS 200 40 NIL 3 0)}
 would allow any cached
value to be accepted if it appeared that recomputing would take more than
200 milliseconds, or would use up more than 40 list cells, or if the value
had been put there less than three Cycles ago.\foo{In Eurisko, a ``Cycle'' is the
time it takes to work on the top-rated task on the top-rated agenda.}
Otherwise, the cache would be ignored, a newer value would be computed, and
it would be stored away.  With it would also be recorded the following information:
(i) the fact that it was
a cached value, not a primitive one, (ii) how long it took to compute,
(iii) how many list cells it used up in computing this value, (iv) the current
Cycle (number of tasks worked on so far). The above call on 
{\:c GET}
 might result
in the following value being stored on the 
{\:c GENERALIZATIONS}
 slot of 
{\:c PRIMES:}

{\:c (*cache* \  (PRIMES NUMBERS OBJECTS ANYTHING)  54  6  9).}
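
Schematically, the acceptance test over such a record might read as follows (the
names are ours, and for brevity we consult only the time, space, and freshness
parameters, not the ask-the-user or minimum-resource ones):

{\:c ; a cache record: (*cache* value msec-to-compute cells-consumed cycle-written)}

{\:c (DEFUN ACCEPT-CACHE-P (RECORD MSEC-LIMIT CELL-LIMIT FRESHNESS NOW)}

{\:c \hjust{\hskip 20pt (OR (> (THIRD RECORD) MSEC-LIMIT)   ; recomputing would bust the time budget,}}

{\:c \hjust{\hskip 30pt (> (FOURTH RECORD) CELL-LIMIT)   ; or the space budget,}}

{\:c \hjust{\hskip 30pt (< (- NOW (FIFTH RECORD)) FRESHNESS)))   ; or the cache is still fresh enough}}

For the record just shown and the arguments 200, 40, and 3, the cache is accepted
while it is still fresh and rejected (and recomputed) once it goes stale, since
recomputation here is cheap on both the time and space scales.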


	      \SSSEC{Storing and Updating Cached Values}

The caching process involves storage and updating. For both of
these aspects, we can discuss details of when, why, how, and what.
The decisions that arise are made with the guidance of heuristic rules,
many of which are illustrated below in
Figs. 6 and 7.
We have omitted many of the
very general ``common-sense" heuristics, and those which deal with details
of Lisp programming. We also have specified the rules in English, rather than
in Lisp; we have freely used expressions like ``recently", although in any
encoding that would have to be made much more precise.

Finally, note that these heuristics are not {\it always} relevant; they are
applicable for many AI programs running
Interlisp on uniprocessing,
PDP-10-sized computers,
with a human user watching and interacting.
There would be other rules for other machine architectures and languages.
In other words, one could say that we have only partially specified the
IF-part (left hand sides) of the following heuristic rules. An alternative
would be to build up an entire knowledge base of heuristics just to determine
the applicability, the domains of relevance, 
of the following (as a function of machine and language.)


\vfill\eject   % Because it takes 2 pages

.

\vfill

\vskip 5xgpin

\ctrline{\bf[FIGURE 6: PRINCIPLES FOR STORING CACHES]}

\yskip

\eject

.

\vfill

\vskip 5xgpin

\ctrline{\bf[FIGURE 6 (continued): PRINCIPLES FOR STORING CACHES]}

\yskip

\eject

.

\vfill

\vskip 5xgpin

\ctrline{\bf[FIGURE 7: PRINCIPLES FOR UPDATING CACHES]}

\yskip

\eject  % Just takes 1 page

	      \SSSEC{The Spectrum of Data Reductions}


Caching, as we have described it, is merely the lowest level in a spectrum of
data reduction processes: techniques for compiling hindsight into usable,
{\it more efficient}, more procedural forms.

We often refer differently to ``caching''
depending upon what it is that's being retained:
open-ended, inductive searches can be condensed in hindsight (i.e. cached)
into heuristics, deductive searches can be cached into much less branchy
algorithms,
subroutines can be cached into tables of frequently called arguments and
resultant values, and variable quantities have their value cached simply by
the process of binding.
To see this, consider the progression of ever more sophisticated tasks
which programs perform [see Fig. 8]: accessing a datum, calculating a
value from a fast algorithm, deducing a result by searching for a solution
or proof in a well-defined space, 
inducing a result by a very open-ended search process.
Often there is the opportunity to take experiences at one of these levels of
sophistication and abstract them into an entity which is one level lower,
yet which appears to contain almost the same power. 

\vskip 2.7xgpin

\ctrline{\bf[FIGURE 8: Progression of caching]}

\yskip

Consider first the way in which we reduce induction. After
much experience with inducing, we produce a {\it heuristic rule} which
enables the same inferencing to be done in a much more controlled space,
in a much more deductive way.  Heuristics reduce discovery tasks to mere
(!) symbolic manipulation within a well-defined system (as in AM [Lenat 1976]).

Deduction is reduced similarly. By many painful deductions, we gain know\-ledge
of the space we are searching, e.g. by deriving many theorems about it.
Those theorems can then be used to vastly simplify our search. Ultimately,
the process becomes efficient calculation. A short, optimal algorithm may
be found, e.g., which replaces a search for a solution (this has recently
occurred with the ``{\sl n queens}'' chess problem.)

Now we see that our caching process is the next step in this progression.
It reduces a calculation (e.g., to traverse a particular access path to
gather all Ancestors of a node) by a lookup, by a single access of a datum
in memory.

All the data reduction processes in the progression are environment-independent
techniques for increasing cognitive economy.  Often, an apparent loss due
to a specific expressive feature can be rectified by applying some of
these kinds of methods.  In our case, we opted for interpretive defining
of slots, but regained efficiency by caching those slot definitions.

%also variable binding fits in?
%ACCESS==> CALCULATE ==> DEDUCE ==> INDUCE BY OPEN=ENDED SEARCH
%
%   \       /      \       /  \      /
%
%     cache          thm, alg   heur rule
%
%[[note: saving multiple
%instantiations of the same rule at different levels of abstraction is
%NOT worth mentioning, I think, since it is a STATIC (eg planning) tactic;
%I may be wrong, and perhaps we should at least mention this v. briefly anyway]]
%
%[[Should we metnion explcitly the loss that may occur if the task involves
%a lot of reasoning one's
%way through delicate inference chains efficiently; or: a lot of
%synthetic reasoning or rule interaction?]]
%

      \SSEC{Contrasts with Psychological Ideas of Economy}

    When semantic networks first appeared in cognitive psychology, they
led to the idea that function should follow form.  In particular,
since taxonomic knowledge structures were hierarchical, inference
processes would presumably follow the same inter-category network paths
each time the corresponding relation needed testing.  It was for this
reason that Collins \& Quillian [1969, 1972] conjectured
that the time required to verify semantic relations would correlate with
distance in a corresponding ``economical" semantic network.  Thus, they
showed that people ordinarily verified ``a canary is a bird" faster than
they could verify either of the two more remote relationships,
``a canary is an animal" or ``a canary has wings."

    Subsequent research, however, showed that the needs of
particular retrieval experiences,
not {\sl a priori} categorizations, determined which inference paths humans would
follow and the associated response times.
A simple counterexample to the presumed hierarchical
retrieval schemes was found by Rips, Shoben and Smith [1973].  They showed
that people access a familiar  but so-called ``indirect"
relation such as ``a dog is an animal"
much faster than the supposedly immediate but less frequent
relation ``a dog is a mammal."
Using experimental materials and methods, Hayes-Roth \& Hayes-Roth [1975]
found that human memory seemed plastic and adaptive in many of the same ways
we have proposed for intelligent machine programs.  Their results led them
to ``an adaptive model of memory, in which all learned relations are
represented directly, with strength proportional to experience.  There is
also good evidence for redundancy in the network, with multiple routes
connecting nodes representing particular pairs of concepts" [1975, p. 519].

    In another study, Hayes-Roth and Hayes-Roth [1977] found some evidence
that people understand and remember English texts directly in terms
of high-level lexical concepts, rather than 
exclusively as low-level or primitive
concepts.  They argued that cognitive processes would presumably
benefit if they could build directly upon the words or other meaningful
units of familiar language.  Any system that interpreted and
stored data only in terms of primitive units, as is often assumed in
language understanding theories, would encounter many slower and costlier
searches.  Both the empirical evidence and theoretical principles support
the idea that human language understanding exploits many kinds of caching.

    In short, we have chosen the term ``caching" to refer to these types of
redundant storage.  Caching, by capturing and exploiting repetition and
regularity, benefits both humans and machines of limited processing
capability.


\NSECP{When to Define a New Slot}

We have discussed
how a new slot {\it can} be defined; consider now how
Eurisko decides when and how   to define a new one.
Our basic reply to {\it When?}
is that a slot is defined only when a heuristic rule
explicitly suggests it as being useful; that is, heuristics are used as
plausible move generators (as opposed to their typical use [Nilsson 1972]
as implausible move pruners).  Our basic reply to {\it How?} is that the
heuristic which suggests a new slot must provide (constraints for) its definition.

This merely pushes the problem back one layer, namely to the way in which the
heuristics know when and how to define new slots.  But now we are talking the
language of AM and Eurisko, and we can provide examples and general characterizations
of the slot-defining heuristics used in Eurisko.  Since each type of slot is
represented internally as a full-fledged (schematized) concept,
the new slots defined can eventually be evaluated (and changed or
discarded if they are not utile).  Each heuristic is also represented as a full-fledged concept,
and {\it its} worth can be determined 
based on the worths of the slots it suggests; thus each slot-suggesting heuristic
is occasionally modified (or even discarded).  Let's look at some cases:

Sometimes the
new type of slot is first proposed purely for aesthetic, analogy-completing, or
exploratory reasons. One such heuristic is:

\noindent ``If R and S are partial inverses of each other, then it's plausible
to define and investigate the following: 
R$\circ$S, S$\circ$R.''
% R$\circ$S -- $i$, S$\circ$R -- $i$  (where $i$ is the identity mapping,) 

There are several noneffective slots provided for the above heuristic in
Eurisko, such as its reason (``aesthetic''), its average running time,
which other rule(s) created {\it it} originally, instances of its use, etc.

Note that the heuristic
applies when R and S are mathematical functions, heuristics, or
slot definitions.  In the latter case, which is of concern to us, when
R is the slot ``Generalizations'' and S is the slot ``Specializations,''
this rule concludes that
it's aesthetic to define ``siblings'' (the
specializations of the generalizations of a concept) and
``inlaws'' (the generalizations of the specializations of a concept.)
Eventually, the former slot is seen to be much more useful than the latter,
which is discarded.
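
In terms of the slot-combiners sketched earlier (and two hypothetical slot readers
{\:c SPECS-OF} and {\:c GENLS-OF} for the Specializations and Generalizations slots),
the two proposals are simply:

{\:c (DEFPARAMETER SIBLINGS-READER (COMPOSE (FUNCTION SPECS-OF) (FUNCTION GENLS-OF)))   ; specializations of generalizations}

{\:c (DEFPARAMETER INLAWS-READER (COMPOSE (FUNCTION GENLS-OF) (FUNCTION SPECS-OF)))   ; generalizations of specializations}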


The new kind of slot-combiner
defined in the previous section
($\lambda\ (f,g)\ f↑*\circ\,g\,\circ\,f↑*$)
was derived purely from symmetry (using a rule similar to this one),
and then it was noticed that some of the
slots which are most useful in the system (e.g., Isas, Examples) share that common
form. 


The definition of a new slot is often based
soundly upon need --- or the absence of need.  For example,

\noindent ``If, by monitoring usage, you notice
that many concepts have a large number of entries for their {\:c F} slot,
Then consider creating some new specializations of
{\:c F}.''

When applied, this leads Eurisko to create
several new slots, each having
fewer entries on the average than {\:c F} had, to cover the original {\:c F} slot.
E.g., after noting that the Examples slot was heavily used,
it would be plausible to consider creating new slots like Extreme-examples
and 
Typical-examples.  

A slightly more specialized heuristic says:

\noindent ``If you frequently perform `Remove x from f(x)', Then
it's plausible to define g as f -- $i$"  

Here, {\it i} is the
identity, so g(x) is simply f(x) with x removed.
This leads to the definition of
the proper-ancestor slot (Genl$↑*$(c) -- c) from the simpler Genl slot,
to the definition
of the slot-combiner PLUS from STAR, and 
of many common, useful
mathematical functions, including
(although Eurisko doesn't actually
do this)
operations on matrices with main diagonals removed.

The above examples should adequately illustrate the manner in which
heuristic rules can generate timely new kinds of slots. A very important
point is that the rules are not specific to defining slots; they apply
equally well to defining new heuristics, new mathematical functions,
new lab procedures, and so on.
One of the bedrock philosophies of the Eurisko effort is that a single
large core of commonsense knowledge can  adequately guide discovery
processes operating on all these levels.

\NSECP{Conclusions}


If forced to survive in a changing world,
most inferential systems could benefit
from
{\it learning}
about their task environment and their own behavior.  For example,
they can exploit new schemata or different slot-names to simplify and
restructure their knowledge.  Self-modification would be cognitively
economical.

If forced to explore large
search spaces with some repetitive regularity, a
cognitive system could benefit by employing
{\it caching}
to save partial results.  We described a variety of techniques to
implement caching, and explained how caching specific results is
one of a spectrum of methods that can make trade-offs among precision,
speed, certainty, and generality.  

Since most environments contain much that is extraneous to the direct
solution of a task, a problem solver in such worlds can gain cognitive
economy by
{\it expectation-filtering} (see Appendix 5.1.)
In general, intelligent systems need to exploit their knowledge about
typicality to reduce their cognitive load and increase their attention
to important data.  In many situations, we believe that expectations
can both reduce the processing required to handle ordinary events
and simplify the identification of surprising events.  

If the tasks are complex, and the resources allotted vary dramatically in magnitude,
then
{\it multiple levels of abstraction} (MLA)
becomes  an {\it economical} representation (see Appendix 5.2.)  In many simple
environments, the advantages of MLA are minimal
and may not even justify their costs.  Thus, routine tasks may
warrant the kind of knowledge compilation that would convert an initial,
expressive MLA knowledge-base into fast, integrated, and largely
uninterpretable code.  Conversely, as task complexity and variability
increase, MLA increasingly provides a basis for intelligent and rapid
restructuring.

	\SSEC{The Need for Further Research}

    In the years to come, AI programs will employ greatly expanded
knowledge bases and, as a consequence, they will explore increasingly
open-ended problem spaces.  Already, a few existing systems show signs
of having more potentially interesting things to do than they have
resources to pursue (e.g., AM, Eurisko).  In the past decades of
intelligent systems R\&D, several design concepts have emerged in
response to contemporary needs for creating ever larger knowledge
bases.  For example, many researchers proposed multiple levels of
abstraction and automatic property inheritance as keystones of
efficiency or ``cognitive economy."  We believe that the value of such
mechanisms derives largely from their usefulness in describing initial
knowledge bases.  Once an intelligent system begins to explore the
consequences of its knowledge and to solve novel problems in a
dynamic environment, it needs to adapt its knowledge to achieve
faster and more profitable retrievals.

We have tried to show what cognitive economy
{\it is}
and
{\it is not.}
It does
{\it not}
consist of a set of static knowledge-base design principles, such
as those proposing taxonomic concept structures with automatic
property inheritance.  Rather, cognitive economy is a feature of
those intelligent systems that learn to solve problems efficiently
and consequently realize more of their lifetime potential.  
As knowledge
bases expand and basic software obstacles are overcome, AI systems
will increasingly address the same question facing intelligent
humans:  ``What would I most like to accomplish next, and how can I do that
economically?"

     A theory of cognitive economy
should explain 
under what conditions (and why)
knowledge needs to be adapted, and should prescribe systematically
how to do it.  In this paper, we have tried to
motivate and lay the groundwork for such a
theory.  
We presented Eurisko's automatic defining of plausible, useful new
slot types.
Many of the design heuristics we mentioned embody some gross features of
an eventual
theory and suggest numerous paths for future research.


\vskip 1xgpin 	% acknowledgements

{\bf Acknowledgements}

\vskip .6xgpin

\ninepoint

\parindent 19pt

We wish to thank Dave Barstow, John Seely Brown, Bruce Buchanan, Ed Feigenbaum, 
Cordell Green, Greg Harris, Barbara
Hayes-Roth, Allen Newell, Herbert Simon, 
and Don Waterman for several discussions which have left their
mark in this paper.

\setcount4 0	%  Reset for start of Appendices

\ASEC{Cognitive Economy in Existing Programs}

To date, most programs, including our own 
AI behemoths,  have {\it not} had to cope
with runtime environments that changed significantly.  In Equation 1, the
costs for monitoring and building in self-modification abilities have been
prohibitive, and the need for adaptiveness has been minimal.
The environments have been {\it complex} but not {\it changing}.  To be
``fit'' in such worlds, programs have had to employ sophisticated architectures,
knowledge bases, control strategies, representations, and expressive languages
in which they are specified; but programs have {\it not} needed to reorganize
themselves dynamically, to adaptively alter those sophisticated design
components.  Even in those situations in which
this flexibility will be required in the long run, it has been avoided, because 
(i) self-modification capabilities are tough to build in (indeed, some programs have
simple self-reprogramming as their sole task),
and 
(ii) monitoring is an expensive overhead in both space
(detailed record-keeping) and time (finding regularities in such histories).
This raises the issue of efficiency:

\ASSEC{Efficiency}

When we build a new, large AI program, 
we often find ourselves caught in the following
tradeoff: On the one hand, we want to develop our initial systems quickly
and painlessly, since they are experimental and ephemeral.  
We want to be able to modify them later with little or no manual redesign or 
reprogramming.
On the other hand,
we want our systems to run as efficiently as possible, to mi\-ni\-mize
their demands for cpu time and primary and secondary memory.
We are torn between
{\it expressi\-bility} and {\it effi\-ciency}.


Many AI  researchers {\sl  cum} language  designers have  focused on  {\it 
expressibility},  the   problem   of  rapidly   constructing   a   working
experimental vehicle.\foo{This objective is apparent in the goals of 
{\:c EMYCIN} [Feigenbaum 1977], 
{\:c KLONE} [Brachman 1978],
{\:c KRL} [Bobrow and Winograd 1977],
{\:c PSI} [Green 1977], 
{\:c RITA} [Anderson and Gillogly 1976], 
{\:c ROSIE} [Waterman et al. 1979],
and
{\:c SAFE} [Balzer et al. 1977].}
They have produced some superlative software such  as
Interlisp's {\:c DWIM}, File, and Break  packages.
Four fundamental  techniques
utilized in highly {\it expressive} programs are:
({\it i}) reliance upon a very-high-level language,
({\it ii}) planning and reasoning at multiple levels of abstraction,
({\it iii}) inference by pattern-directed knowledge sources, and
({\it iv}) mi\-ni\-mal, nonredundant representation, as in a canonical
gene\-rali\-zation/spe\-ciali\-za\-tion hierarchy.

Both within the body of this paper (the Caching section) and within these
appendices (the Expectation-Filtering and Levels of Abstraction appendices)
we present various
techniques which do not sacrifice expressibility, yet enable
programs to (semi-)automatically improve themselves and thus  increase
their productivity.
One point of this paper is to show that the second goal of AI programming,
short-term runtime efficiency, is not sacrificed by these techniques.

The basic source of power is the ability to predict the way that the program will
be used in the future, and to tailor the program to expedite such uses.\foo{This
approach is sometimes used {\it manually}, as when a program is quickly coded
in an expressive but inefficient form, used heavily, and then recoded in a
much more optimized form.  Thus, e.g., Ray Carhart spent 1977 
rewriting DENDRAL (streamlining its CONGEN module)  for a minicomputer.}


The traditional approach to program optimization has assumed that static analysis of
programmer's expectations (whether implicit in his code or explicit in his
formal assertions) can identify the significant
optimization opportunities.  Three methods for analyzing program
descriptions in this way are:
({\it i}) analyzing 
program flow and structure [Knuth 1968] [Dijkstra 1976],
({\it ii}) designing data structures to be appropriate,
and ({\it iii}) compiling (as in the {\:c FORTRAN H} compiler) and optimizing
transformations of the program (as in 
[Darlington and Burstall 1973, 1977] or [Balzer et al. 1977].)

In a changeable environment, 
we advocate the use of methods more {\it dynamic} than these.  Rather than
improving the static description of a program, we propose
that the program modify its own structure to adapt it to its
operational environment.\foo{Some
earlier automatic programming systems [Darlington and Burstall 1973]
[Lenat 1975] [Low 1974] [Kant 1977] [Barstow 1977]
were designed to improve programs' efficiency, and many of their
ideas have found their way into the techniques we now describe.  
Those systems 
had program construction, transformation, and optimization as their primary
task; we are proposing general methods to provide 
self-improving abilities
to other kinds of AI programs
(understanders, theory formers, expert simulators, planners, etc.).}

One valuable source of prediction comes from the program's experiences as it runs.
If a program can ``learn" from its experiences, it might try applying each piece of
newly acquired knowledge to the task of specializing itself, modifying itself
to run more efficiently in the current runtime environment.
We believe that a program's ``intelligence" can be increased in this way;
that is, by increasing its ability to
acquire  appropriate knowledge, to organize that knowledge, and to refine
the conditions under which that knowledge is recognized to be applicable.
For any fixed quantum of manpower ({\it sic}) resources we expend, there is a
limit to the size/complexity of programs which can be successfully implemented.
This barrier might be overcome by programs which are self-extending, which actively
do things to enlarge their input domain, their capabilities.

\hbox par 2.9in{    % this makes a space for the little figure; see below.
Notice that representing a corpus of knowledge as a  mi\-ni\-mal
(canonical)      gen\-erali\-za\-tion/spe\-ciali\-za\-tion hierarchy, with
interpretatively-com\-puted in\-he\-ri\-tance, may or may {\it  not}
be cognitively economical:
this technique
fa\-vors ex\-pres\-si\-bi\-lity,  compaction of  re\-pre\-sentation,
and efficiency in updating and altering the knowledge base, but
at  the expense  of
performance. It is  true that  a  dog is a mammal,  and a  mammal is  an
animal, and from those two we could compute that a dog is an animal (see
Fig. 9), but
if the same arc is being repeatedly recomputed then
 it is more cognitively economical to store one redundant  arc.
Studies by ourselves and others
[Hayes-Roth and Hayes-Roth 1975] [Rips et al. 1973] indicate
just such redundancies being created and strengthened by repetitions.
}

% [[in space above, insert DIAGRAM OF DOG/MAMMAL/ANIMAL,
% WITH REDUNDANT ARC AS A !STRONG! DOTTED ARROW]]

Obviously, the economy of specific representations and related inference
methods depends heavily on underlying machine architectures and costs.  We
assume that intelligent systems aim chiefly to produce useful results
(e.g., novel hypotheses) as rapidly as possible.  In short, they should
search cleverly and fast.  Thus, {\sl a priori} notions of aesthetics or design
economy have little bearing on cognitive economy.  Massive redundancies
in storage, processors, and inference processes seem more directly
relevant.  Furthermore, we suppose that, like people, intelligent software
should eliminate slow, interpretive searches for previously solved
problems (especially if the same problems are repeated often).
To this end, cognitive economy may be achieved
through dynamic compilation of action sequences.  Such compiling would
reduce the time-cost to effect the corresponding behavior; on the other
hand, this efficiency would appear to be gained only
at the cost of interpretability,
interruptibility, and modifiability.  Those are desirable attributes iff
a program potentially needs continual changing.  
In typical situations, both efficiency
of compiled forms and accessibility to declarative forms are intermittently
needed.  These different needs point again to the economic benefits of
maintaining simultaneously alternative and redundant representations.

\ASSEC{When It Pays to Adapt}


We can abstractly  characterize the situations in which an AI program
would find it cognitively economical to monitor and adapt itself: the resources
available for specific tasks vary significantly; or the specific tasks have (at
least locally) some similarities, some structure which enables them to be solved
quickly in the same particular way.  Here are some examples:

The load average of a time-sharing system fluctuates with the time of day, but
users demand that their speech understanding system always
provide the best guess it can in real time.  

A hospital simulation program can run at excruciating levels of detail,
and it lives on a dedicated machine. But
there's been a bad accident nearby and within one minute the police must have
an estimate of how many emergency cases the hospital can accept.

A program is given an enormous amount of resources to prepare for each task,
but few resources between the time a task is presented and the time an answer is
called for.  It may be cost-effective for the program to spend some of its
preparation (inter-task) time trying to {\it discover} new algorithms,
representations, etc.  We are struck by this in nature (evolution), though
as yet we just glimpse it in AI programs (such as Eurisko).

A system is sold to several customers, each with a distinct but complex
task environment in which it will live.  It then becomes cost effective to
spend some resources streamlining the program (either manually or
automatically) to fit that user.  
In terms of equation 1, the environment changes only once, so no continuous
monitoring and modification is performed; the onetime cost of the modification
can be amortized throughout the lifetime of the program (i.e., the penalty
for non-specialization is integrated over time.)
The well-known SYSGEN generation of
tailored operating systems is a simple example of this.

While maintaining a file system, the program notices the patterns in each user's
accesses to his files, and in those of the community as a whole, and occasionally
selects a new data structure for some part of the system to make it run more
efficiently.  See [Low 1976].


\ASEC{The  Assumptions}

{\bf Summary:\  \sl The claims we make elsewhere in
this paper presuppose that ``intelligence" can be
adequately modelled as heuristic search, guided by
pattern-directed rules, implementable on a uniprocessor.}

\yskip

Every scientific theory is  constructed in a  rich context of  surrounding
theories, methods,  and standards  determining which experiments  are
reasonable ones to perform and which point of view to take when  interpreting
the results  --- in short, a {\it paradigm}. We feel it useful
to articulate the ``core" of our paradigm (the assumptions our  theory
rests upon) before  delving into more detail.
For that reason, this section is devoted to a presentation of our
assumptions, including:
a model for intelligence
(Appendix 2.1),  a  model for  how  intelligent computer  programs  may  be
organized (Appendix 2.2),  and a  model of the  characteristics of  present
computing engines (Appendix 2.3).  Briefly:

(i)  We  continually  face  searches  in  more  or  less  immense  spaces;
intelligence is the ability to bring {\it appropriate} knowledge to  bear,
to speed  up  such  searching.   Increasing  intelligence  then  comprises
increasing  knowledge, improving its  organization,  and refining the
conditions  for  its
applicability.  How are  these new discoveries  made? Empirical  induction
(generalizing from experimental and other observations), analogy, and  the
criticism  of  existing   theory  all   lead  to   new  conjectures,   new
possibilities.

(ii) Intelligent systems  can make  the applicability  of their  knowledge
explicit by representing that knowledge as condition-action rules ({\:cIF {\sl
situation}  THEN  {\sl   appropriate  reaction}}).  Such   pattern-directed
inference systems (PDIS) can benefit from a schema representation  (frame,
unit, being, script, etc.),  because this adds  structure which the  rule
descriptions can exploit.

(iii) Current computing technology presents us  with limited cycles, cheap  storage,
and expensive knowledge acquisition.



% \vfill\eject\SSEC{A Model of Intelligence}
\ASSEC{A Model of Intelligence}

{\bf Summary:\  \sl Since ``intelligence" depends upon bringing relevant
knowledge to bear, it can be increased not only by adding new knowledge,
but also by better organizing the existing knowledge.}

\yskip

Many human cognitive activities, including most of those commonly referred
to as ``requiring intelligence," can be cast as searches, as  explorations
through  a  search  space,  meandering  toward  some  goal.   We  call   a
problem-solver ``more  intelligent''  if  it can work efficiently toward  a
solution even in the  face of an immense  search space and an  ill-defined
goal.  Somehow, it is  imposing more structure on  the problem, and  using
that to search more effectively. How might it do this?
According to our model of human intelligence, it accomplishes the task  by
bringing knowledge  to  bear, knowledge  not  supplied explicitly  in  the
problem statement.  This knowledge can be about problem-solving in general
(e.g., how to recognize and break cultural blocks [Adams 1974])
 or about the  problem
domain specifically (e.g., which groups of chemical
atoms can usually be treated as aggregate
superatoms [Feigenbaum 1977]).

This implies  that a  problem  solver can  become  more effective  by  (i)
increasing its knowledge, (ii) better organizing its knowledge, and  (iii)
refining its criteria for  deciding when various  pieces of knowledge  are
applicable.  In terms of production  (If-Then) rules, these correspond  to
adding new rules, modifying  the data structure in  which rules are  held,
and tuning the conditions (If parts) of existing rules.

One very basic assumption we are making is that of {\it continuity}:
the problem-solving environment does not change its character too
radically, too rapidly.  That is, our environments are {\it fluid} but
not gaseous.  Let's see why this assumption is required.
If the program abstracts from its experiences a plausible heuristic H, it may be
able to analyze past scenarios to verify that possessing and obeying H
would definitely have improved its performance (e.g., led to the current
state of knowledge more quickly);
but its only justification for believing that H will help it in the {\it
future} is the empirical evidence for the continuity of the environment.
Learning can be translated effectively into action only when the structure of
the environment has not altered too greatly.

A related assumption is that of {\it irreversible knowledge accretion}.
Our body of facts and observations continually grows and only occasionally
shrinks.  Certainly the abandonment of a whole system of thought is a
rare event indeed (see [Kuhn 1972]), 
whereas the adoption of new systems of thought is part of
the standard educational process.


\hbox par 2.9in{    % this makes a space for the little figure; see below.
New knowledge is discovered by 
specializing some broadly applicable but weak principles, or by
generalizing
particular experiences. The latter includes abstraction
(condense some experiences into
a  heuristic which
would have greatly aided us in the  past if only we'd had it) and
analogy (draw parallels  to  other
existing facts and heuristics,  and also to the  {\sl paths} which led  to
their discovery.)
One sometimes ``spirals in'' [Lakatos 1976]
toward precision and competence (see Fig. 10).}

%(hypothesize, criticise, and
%improve the hypothesis) [Lakatos 1976].


   \ASSEC{A Model of Intelligent Program Organization}

{\bf Summary:\  \sl 
Since we want our AI programs to access, reason about, and expand their own knowledge
bases, it is useful to represent such knowledge in a clean, modular form
(e.g., employing any one of the current schematized representations.)}

\yskip

The preceding remarks  about intelligent  problem  solving apply to
``hardware" and ``wetware" alike.  To be intelligent, computer programs
must  ultimately  grapple  with   the  tasks  of  knowledge   acquisition,
representation, and refinement. We cannot provide an absolute answer to how
they should be organized, but we  can posit some design constraints  which
have proven useful so far.

A very  general heuristic  in AI  programming is  the following:  {\sl If  your
program is  going  to  modify its  own  {\bf  X}, then  make  {\bf  X}  as
separable, clean, and explicit  as possible.} In our  case, {\bf X} can  be
instantiated as  ``knowledge," or  as  ``applicability conditions  for  each
piece of knowledge."  The heuristic thus advises us to represent our
knowledge in a separate,  clean, explicit form,  say as knowledge  modules
having some fixed internal 
structure, and also to keep the applicability
conditions for  each  knowledge  module  separate from  the  rest  of  the
knowledge it contains.

This naturally leads us to  a pattern-directed inference system, in  which
knowledge is broken into  separate modules, each with  an explicit set  of
relevancy tests [Waterman and Hayes-Roth 1978].  Such systems arising in
Pittsburgh may champion
syntactic purity, while  those arising  in
California may tend toward the baroque,
but such variations  should not obscure  their common source  of
power.  The PDIS architect breaks his program's knowledge into a set  of
condition-action  production rules.

Having a clean  representation for  rules means having  a clean,  precise,
elegant language in which to express them.  By structuring the  conceptual
knowledge of  the system,  by partitioning  each module's  knowledge  into
several categories, a rule can  condense an entire cumbersome  description
into  a  simple reference (often a single pointer).  The  popular  schematized
representations  of
knowledge (scripts for episodes, frames for static situations, beings  for
specialists, units for  everything) enable a  rule like ``If  there are  no
known methods specific to finding new examples of prime numbers,  then..."
to have its condition coded as a simple null-test on the ``To-get" subslot of
the ``Examples" slot of the schema called ``Prime Numbers."  
By a judicious  choice
of slot  types and  subslot  types, the  system  builder can  reduce  most
triggering conditions  to  such  quick  checks on  the  state  of  various
(subslots of) slots of schemata.
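To make this concrete, here is a minimal sketch of such a triggering condition,
written in a Lisp-like notation.  The schema fragment, the accessor
{\:c GET-SUBSLOT}, and the predicate name are merely illustrative assumptions,
not Eurisko's actual structures; the point is simply that the rule's IF part
reduces to one quick null-test on a subslot.

\parskip 1pt

{\:c (DEFVAR PRIME-NUMBERS-SCHEMA  ; an illustrative fragment of one unit}

{\:c (QUOTE ((EXAMPLES (TO-GET NIL) (DIRECT (2 3 5 7 11))))))}

{\:c (DEFUN GET-SUBSLOT (schema slot subslot)  ; fetch one subslot of one slot}

{\:c (CADR (ASSOC subslot (CDR (ASSOC slot schema)))))}

{\:c (DEFUN NO-SPECIFIC-EXAMPLE-METHODS (schema)  ; the rule's IF part}

{\:c (NULL (GET-SUBSLOT schema (QUOTE EXAMPLES) (QUOTE TO-GET))))}

{\:c (NO-SPECIFIC-EXAMPLE-METHODS PRIME-NUMBERS-SCHEMA)  ; returns T, so the rule may fire}

\parskip 20pt plus 3pt minus 1pt \lineskip 8pt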

Additional knowledge can be
brought to bear to locate efficiently all the rules which
{\it might} have their conditions satisfied in a given situation, and also
to decide which rules to execute (obey) among those whose IF  parts
have triggered (been satisfied).

   \ASSEC{A Model of (Present) Computers}


{\bf Summary:\  \sl 
Since we  are  considering the  problem  of building  computer  models  of
intelligent  behavior,   many   of   our   assumptions   deal   with   the
characteristics of present-day computers. They are the symbol manipulators
we use, but were not designed for that general purpose.}


\yskip

The foremost quality of computing machines, as we know them today, is their
limited ``inferential bandwidth."  Although they provide virtually {\sl (sic)}
limitless storage capacity, these machines can process only a small fraction
of this information in a reasonable amount of time.  Two primary reasons
for this disparity between storage and generation capacities are:
(1) knowledge acquisition proceeds slowly and expensively; and (2) inference
processes depend on operations that ordinarily must be carried out serially,
consuming the cycles of a uniprocessor system.  

The first problem places a {\it cultural} limitation on the
development of intelligent machines.  This development is restrained by our
inability to express what we would like our machines to do or to convert
these expressions into effective, machine-interpretable code.  

The second
problem reflects a {\it technological} constriction of our potential.  The
so-called von Neumann bottleneck, caused by machine architectures that
sequentially interpret program steps through a single logical processor,
strongly motivates concern for programs that can mini\-mize unnecessary
computations.  

In some future environment, perhaps neither of these problems
will persist.  In the meantime, however, we assume that two of the best ways
to increase AI system productivity, derived from these current limitations,
will be to expedite knowledge acquisition and to avoid unnecessary computing
(for instance, by opting to ``store & recall" rather than ``forget & 
recompute").


\ASEC{Another Example of GET}

Here is another example of how Eurisko's expanded 
{\:c GET}
 works:
we consider an access on the 
{\:c PARENTS}
 slot, when that slot is not primitive,
but rather is defined as the union of slots for father and mother.

\parskip 1pt

{\:c (GET MERLE PARENTS)}

{\:c ((GET PARENTS DEFN) MERLE)}

{\:c (((GET DEFN DEFN) PARENTS) MERLE)}

{\:c ((($\lambda$ (s) ((GET (GET s SLOT-COMBINER) DEFN) (GET s BUILT-FROM))) PARENTS) MERLE)}

{\:c (((GET (GET PARENTS SLOT-COMBINER) DEFN) (GET PARENTS BUILT-FROM)) MERLE)}

{\:c (((GET UNION DEFN) FATHER MOTHER) MERLE)}

{\:c ((($\lambda$ (s1 s2) ($\lambda$ (c) (LIST (GET c s1) (GET c s2)))) FATHER MOTHER)  MERLE)}

{\:c (($\lambda$ (c) (LIST (GET c FATHER) (GET c MOTHER)))  MERLE)}

{\:c (LIST (GET MERLE FATHER) (GET MERLE MOTHER))}

{\:c (LIST SID BETTY)}

{\:c (SID BETTY)}



\parskip 20pt plus 3pt minus 1pt \lineskip 8pt

\yskip
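The behavior traced above could be produced by a retrieval function along the
following lines.  This is only a schematic sketch, not Eurisko's code (we
call it {\:c GET1} merely to avoid colliding with a host Lisp's built-in
{\:c GET}); {\:c LOOKUP} is assumed to return a slot's value only when that
value is physically stored on the unit, and the chain of {\:c DEFN} lookups
must eventually bottom out in a stored definition, as it does above at
{\:c (GET DEFN DEFN)}.

\parskip 1pt

{\:c (DEFVAR *UNITS* NIL)  ; ((unit (slot value) ...) ...), an illustrative store}

{\:c (DEFUN LOOKUP (unit slot)  ; a slot's value, if physically present}

{\:c (CADR (ASSOC slot (CDR (ASSOC unit *UNITS*)))))}

{\:c (DEFUN GET1 (unit slot)}

{\:c (OR (LOOKUP unit slot)  ; return the stored value if there is one,}

{\:c (APPLY (GET1 slot (QUOTE DEFN)) (LIST unit))))  ; else compute the slot's definition and apply it}

\parskip 20pt plus 3pt minus 1pt \lineskip 8pt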

\ASEC{Dynamically Monitoring the Task Environment}


{\bf Summary:\  \sl
The body of this paper has amply
illustrated how a program might profit from knowledge
it gathered dynamically. This suggests that
rather than working on the given problem exclusively, 
a program may be better off expending some
time {\it learning} (about the specific problem, its broader domain, or
problem-solving in
general).  Note that this suggestion encompasses traditional education (being
told), empirical research (observing),
and theoretical exploration (predicting).
While such ``very high-level" problem-solving strategies typify human
problem-solvers (especially those of us who have
rationalized spending twenty or more years ``in school''), very few programs to date
have employed any of these forms of learning to improve their operation.}

\yskip


We discussed in Section 1 the ``environmental fluidity"
conditions under which it could be cost effective for
an intelligent system to be able to improve its
performance dynamically.  To accomplish this, a program
must first be able to monitor its processing, i.e.,
to sense, record, and analyze
its representations, data structures, algorithms, inference procedures, and the
way they are currently being relied upon.
It can then use this gathered knowledge to modify or redesign itself to
perform more efficiently.


Some programs (and wish-lists for program capabilities) have included learning
more about the problem being worked on;\foo{ McCarthy's advice-taker and
Balzer's dialogue system (to get airline reservation programs written) would
be told; Gelernter's theorem-prover (employing diagrams as analogic
models) could observe; Lenat's AM
could observe and predict.}
in this paper, however, we are stressing learning about the
ways in which the program is currently being employed.

{\bf Learning by Being Told}

There are many situations in which a user could provide advice to a program:

\yskip

{\it (a). } The user might want to convey to the program
some task-specific item: ``Integration by
parts won't work on this problem;'' ``This is a very hard
problem.''   This can easily be input as part of the user's statement of the
problem for the system to work on; e.g., the user might designate some methods
as being preferred or constrained, estimate a resource allotment, etc.

{\it (b). } Another type of advice
will apply permanently to the nature of the program's functioning
(e.g., ``Answers, even if uncertain, must be produced in real time;''
``Hard problems will be
starred.'')  Such knowledge can be reflected permanently in
the design of the program, in its code.

{\it (c). }  But there is an intermediate level of advice, some information
the user knows will concern
the next several tasks to be submitted --- not just the present one and not
all future ones:
``The next four problems will be supplied in order of
expected difficulty;" ``Several of these problems share common assumptions
and raise related issues;" ``The user for the next entire
session will be a physician;" ``Until I find out why F34 returned NIL, I'll
be in debugging mode, not production mode."  Such advice is rarely communicable
even to AI programs.

We believe this type of intermediate-level
advice would improve
programs if they had the opportunity to accept and use it.
In general, we would like to advise the program which capabilities to
consider and which to ignore during a prolonged period of solving closely
related problems.
A fixed (static) approach, for example, would be to define a few modes
(e.g., Debug, Demo, Production, Input,...) and permanently build into the code
all changes required to specialize the system for each mode. A flexible
(dynamic) approach would
provide a larger language in which the user could describe his current
mode of usage of the program (e.g., by typing ``{\:cUSAGE of KNOWLEDGE-BASE
will-be INCREASED VERY-MUCH}'').
The program would translate such statements into
changes in its data structures, algorithms, explanations, core management
strategies, etc. In the current example, heavy usage of the knowledge
base might
cause the program to swap out its compiler, optimizers, and other code,
enabling it to hold many more knowledge chunks at a time in core.
The ability to hold more knowledge in memory would directly affect the rate
and probability of productive inferences.\foo{Similar effects have been shown
for humans: [B. Hayes-Roth 1979 {\it Cognitive Science}, in press]}
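As a minimal sketch of how such a statement might be absorbed (the advice
format, the agenda, and the action names below are purely illustrative
assumptions, not a description of any existing system), the advice could be
parsed into a few entries on an internal reconfiguration agenda which the
program then carries out when convenient:

\parskip 1pt

{\:c (DEFVAR *AGENDA* NIL)  ; reconfiguration actions the program will carry out}

{\:c (DEFUN ACCEPT-USAGE-ADVICE (facility direction amount)}

{\:c (COND ((AND (EQ facility (QUOTE KNOWLEDGE-BASE)) (EQ direction (QUOTE INCREASED)))}

{\:c (PUSH (LIST (QUOTE SWAP-OUT) (QUOTE COMPILER)) *AGENDA*)}

{\:c (PUSH (LIST (QUOTE SWAP-OUT) (QUOTE OPTIMIZERS)) *AGENDA*)}

{\:c (PUSH (LIST (QUOTE ENLARGE) (QUOTE KB-CORE-ALLOTMENT) amount) *AGENDA*))}

{\:c (T NIL)))  ; other advice forms would be handled analogously}

{\:c (ACCEPT-USAGE-ADVICE (QUOTE KNOWLEDGE-BASE) (QUOTE INCREASED) (QUOTE VERY-MUCH))}

\parskip 20pt plus 3pt minus 1pt \lineskip 8pt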

\yskip

{\bf Learning by Observing}

Spending a little time/space reconfiguring to reflect the current state
of the world (the nature of the recent and succeeding inputs) seems
worthwhile; the problem is knowing just what is unusual about the current
environment. In the last subsection, we discussed the case where the
user explicitly tells the program. But this is often impossible (as when
the
user is unaware of an impending irregularity, or when the ``user'' is
Nature), and almost
always a nuisance (learning the language in which to express such advice,
remembering to do it, taking the time to do it). 

The next step seems to be to allow the program to {\it infer} such facts
about its run-time environment from observations.
What kinds of self-monitoring can be usefully employed?

The method which comes to mind first, perhaps, is to save a complete
record of everything that ever happened to this program: all the inputs
it received (when, from whom), function calls and accesses it made internally,
paths it chose/rejected (and why),
and outputs it generated.  The program could in principle
induce, from such an exhaustive
history-list, patterns such as ``very light usage of
compiler;'' ``SQRT is always called
on the result of SUM-OF-SQUARES.''

The space costs of such an archive, and the temporal cost of the
procedures
to infer useful patterns, are surely prohibitive.
The more economical approach is to analyze in advance  (and change slowly
throughout the program's lifetime [Samuel 1963]) certain
features we would wish to look for in a complete history record, and then
to have the program record just enough dynamically to gather and maintain
those features. Some of these will be domain-independent and
commonly useful:
the frequency of calls on various functions, of requests for subtasks, etc.;
patterns in the arguments to calls on certain functions and in the inputs to the program;
patterns in the sequence of calls on functions;
and patterns in functions' results (e.g., F34 always seems to
return a User-Name).
Some of the features will be domain-dependent specializations of the
preceding ones (``average degree of input equation'' is a special
``pattern in the input'').
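For instance, a program might maintain the first of these features by routing
calls to monitored functions through a small recording wrapper.  The sketch
below is merely illustrative (the variable and function names are our own
inventions); the recorded argument lists are the raw material from which the
later pattern analysis would work.

\parskip 1pt

{\:c (DEFVAR *CALL-COUNTS* NIL)  ; association list of (fn . count)}

{\:c (DEFVAR *CALL-ARGS* NIL)  ; raw material for later pattern analysis}

{\:c (DEFUN RECORD-CALL (fn args)}

{\:c (LET ((entry (ASSOC fn *CALL-COUNTS*)))}

{\:c (IF entry (RPLACD entry (1+ (CDR entry)))}

{\:c (PUSH (CONS fn 1) *CALL-COUNTS*)))}

{\:c (PUSH (CONS fn args) *CALL-ARGS*))}

{\:c (DEFUN MONITORED-SQRT (x) (RECORD-CALL (QUOTE SQRT) (LIST x)) (SQRT x))}

\parskip 20pt plus 3pt minus 1pt \lineskip 8pt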


{\bf Learning by Predicting}

The theoretician and the engineer share
a powerful technique: manipulating
a model of a system to generate predictions and identify those possible
outcomes that would be desirable.
This is a third route to learning.
How might programs exploit it?
What sorts of theories, or models, can be employed to generate new
information?

{\it Models of the program's users} can be valuable. This includes
maintaining and extending profiles of individual users and whole
groups. When a user opens a dialogue with Eurisko with the word
``Let", it is a safe bet that he's a mathematician.  Later, when he
uses the word ``function," his membership in that group makes it easy
to determine which meaning he has in mind
(instead of, say, what a physiologist or a computer scientist or sex therapist
might mean
by ``function.'')
When the program wishes to
mention $\sqrt{-1}$, it uses its 
{\:c MATHEMATICIAN}
 model to
choose an appropriate notation (``{\sl i}'' for mathematicians
instead of, say, ``{\sl j}'' for
engineers).  Models can be arranged in a hierarchy, with inheritance
of features from more general groups to their subsets, a lattice
ultimately terminating in models of individuals.  User inputs would
be recognized by various models, which would then hypothesize membership
in that group (or explicate some future conditions which would confirm or
deny such inclusion).  Once triggered in this way, the user's apparent groups
(and their generalizations) would actively
filter his inputs and outputs, and could modulate the program's actions (e.g., by
changing the worth of the ``Proof" concept, which in turn would change
the priority rating of tasks on an AM-like agenda and thus alter
the program's choice of what to work on next).
New user models can be created either top-down (e.g., when a model has
many diverse instances or has proven historically to be very valuable,
then try to split
it into several specializations, each having a share of the general model's
instances) or bottom-up (i.e., notice that a group of individuals share a
set of common characteristics, and create a new user group out of them).
At our suggestion, a student at CMU has built a program embodying some of
these ideas [Rich 1979].
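A minimal sketch of such a hierarchy follows; the group names, the features,
and the single trigger word are illustrative assumptions only.  {\:c MODEL-FEATURE}
looks for a feature locally and then climbs the {\:c GENL} (generalization)
links, so that, for example, {\:c (MODEL-FEATURE (QUOTE MATHEMATICIAN) (QUOTE SQRT-MINUS-1))}
returns {\:c I}, while the same query about an engineer returns {\:c J}.

\parskip 1pt

{\:c (DEFVAR *USER-MODELS*  ; each model: (name (GENL parent) feature pairs)}

{\:c (QUOTE ((MATHEMATICIAN (GENL USER) (TRIGGER-WORDS (LET)) (SQRT-MINUS-1 I))}

{\:c (ENGINEER (GENL USER) (SQRT-MINUS-1 J))}

{\:c (USER (GENL NIL)))))}

{\:c (DEFUN MODEL-FEATURE (group feature)  ; look locally, then up the GENL chain}

{\:c (AND group}

{\:c (LET ((model (CDR (ASSOC group *USER-MODELS*))))}

{\:c (COND ((ASSOC feature model) (CADR (ASSOC feature model)))}

{\:c (T (MODEL-FEATURE (CADR (ASSOC (QUOTE GENL) model)) feature))))))}

\parskip 20pt plus 3pt minus 1pt \lineskip 8pt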

The preceding kind of prediction is based on a theory of user types, embodied
in a set of ``typical representative'' models.  The same source of power can
be used in many ways: stereotyping of events (as in scripts), stereotyping
of static complexes (as in frames), stereotyping of communities (as in
actors, beings, and contract nets).  In all these cases, the source of power
lies in quickly recognizing probable membership in (applicability of) some
schema, after which all the rest of the schema
(plus knowledge about all its generalizations) is presumed to be relevant.
Programs might build up and utilize scenarios of how problems are solved
typically in their domain.  The building-up could be top-down via specialization
or bottom-up via generalization, or even ``sideways" via analogy. The 
models would be used for prediction (recognize relevant models and activate
them, have them hypothesize probable future situations).
Together, the models would form a theory of how problems are
solved in that domain.


\ASEC{Cognitive Economy in ANY Environment}

{\bf Summary:\  \sl 
A coherent theory of
cognitive economy must characterize the sources of power of its
methods, and describe ways in which they specialize for environments
having various properties.
While the body of this paper concentrated upon cognitive economy in {\it fluid}
environments, we hope to see such a comprehensive theory develop
in the future. For that reason, we shall present (briefly, here in an
appendix)
a few techniques
for enhancing cognitive economy in {\it arbitrary} environments.  
We have already described
in great detail (Sec. 2.2) the
{\it caching} of intermediate results (intelligent redundancy)
and will now treat two additional methods: expectation-simplified
processing (intelligent focus of attention) and multiple levels of
abstraction (intelligent knowledge structuring.)
}

	\ASSEC{Expectation-Simplified Processing}

{\bf Summary:\  \sl 
For efficiency's sake, an intelligent system should be willing and
able to add new facts, but should be {\it eager} to add surprising new facts.
Surprises can only be noticed by contrast with {\it expectations}, so an intelligent
system should maintain a context of expectations and filter incoming observations
against that.  Furthermore, expectations and surprises can aid an intelligent
system in comparing its model and processing of the domain to the real world.
Through such monitoring, discrepancies may be found and diagnosed, leading to
changes in the model that make it more consistent with observed behavior.  Our
discussion here centers on the importance of using such expectations
to focus and filter intelligent processing.}

\yskip

The world bombards our senses with data, much more than we can process
in detail in real time; yet we can't live in a hundred times real time.
We survive by ignoring most of that data, or, more precisely, by knowing
(almost  immediately deciding) 
what can be ignored.  We maintain a set of expectations, against
which the incoming torrent is matched. Almost all of it will match those
expectations, and we then need merely process the unexpected inputs, the
surprising observations.  We reserve our computing for those opportunities
which promise us genuine new facts, rather than reconfirmation of known ones.

Such concentration on the abnormal is a natural side-effect of being guided
by expectations --- albeit usually stereotyped ones --- about situations,
events, and people.
One is able to zip through most of what is encountered, match it immediately
to the active schema, and move on. The only processing which is slow and
tricky is that dealing with the remaining anomalies, exceptions, puzzles.

The more general position is that each slot carries not merely a default,
but also an indication of how important it is to confirm that value.
Some  slots  will  have  meta-comments  like  ``I'm  worth  computing   or
verifying'', or just  the opposite.  For example, in our  ``Room'' 
frame,  we have  a
default Ceiling, but in everyday life
we know not to squander time checking that each room
we're in has a ceiling.  Similarly for emergency exit signs: 
they usually exist, and we rarely feel it's cost-effective to confirm that.
We have in
both situations a very fast algorithm we can apply if ever we are in a
situation in which we need that information.  For ceilings, it's
$\lambda\ (\ )\ (look upwards)$; for  exit
signs, it's ``look for red on white lettering.''\foo{
While this may seem to be a much more complex program to execute, recall that
if we truly need to run it --- say in a sudden power failure ---
we expect those lit signs to be one of the few things we could still see.}
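A minimal sketch of a frame whose slots carry a default, a judgment of whether
confirmation is worth the trouble, and the name of a fast verification routine:
the slot contents and the perception routines {\:c LOOK-UPWARDS} and
{\:c LOOK-FOR-RED-ON-WHITE-LETTERING} are illustrative assumptions (and, with the
values shown, those routines are never actually invoked).

\parskip 1pt

{\:c (DEFVAR ROOM-FRAME}

{\:c (QUOTE ((CEILING (DEFAULT PRESENT) (WORTH-VERIFYING NIL)}

{\:c (FAST-CHECK LOOK-UPWARDS))}

{\:c (EXIT-SIGN (DEFAULT PRESENT) (WORTH-VERIFYING NIL)}

{\:c (FAST-CHECK LOOK-FOR-RED-ON-WHITE-LETTERING)))))}

{\:c (DEFUN FILLER-OF (frame slot)  ; trust the default unless verifying is worthwhile}

{\:c (LET ((d (CDR (ASSOC slot frame))))}

{\:c (IF (CADR (ASSOC (QUOTE WORTH-VERIFYING) d))}

{\:c (FUNCALL (CADR (ASSOC (QUOTE FAST-CHECK) d)))}

{\:c (CADR (ASSOC (QUOTE DEFAULT) d)))))}

{\:c (FILLER-OF ROOM-FRAME (QUOTE CEILING))  ; returns PRESENT without any looking}

\parskip 20pt plus 3pt minus 1pt \lineskip 8pt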

The policy for whether, when, and how to check a default slot value can be
arbitrarily complex. The program may be designed to
reason at great length about how much time to expend re-confirming the expected.
It may reason about the task of reasoning about how much time to spend,
and so on.

\yskip

\noindent Several 
increases in efficiency are realizable through expectation-filtering:

(a). Perceptual set: Expectations allow you to see what you expect with less effort.
For emergency exit signs, when we need to find them, we know to look for horizontal
and vertical red lines; this is 99$\%$ effective. When we look at our red LED watch
with a perceptual set to read the time, we of course would not mistake its
image for that of an EXIT sign.

(b). Surprise:  If we observe Y when we expected X, our attention is captured.
In one way, sensitivity to data inconsistent with
expectations has been heightened.  We saw in (a), above, that reliance on
frames will dull our chance of noticing a surprising datum
(since we will tend to interpret any data as consistent
with the current frame).
This point (b)
says that {\it if} we do notice it, then it will have a significant impact.
Carefully building up an expectation and then abruptly shattering it with a
jarring surprise is the archetypical template for a good joke. 
As Arthur
Koestler observes, the jarring of conflicting matrices of thought is what
powers art ({\sl ah!}), science ({\sl aha!}) and humor ({\sl haha!}).

(c). Defer: By relying upon expectations, upon defaults, we can usually defer (often
forever) the processing needed to find the true value in the current case.  At least, when
we {\it do} finally have to make a particular perception, it may at that time
be easier to make. When the ceiling is falling, it's easier to see; when all the
lights go out except for the EXIT signs, it's easier to see them.
Generalizing, we may state that it's worth a little computing to reason
out whether (and maybe even how long) a slot-filling
can be ignored.

(d). Predict and Prepare:  This includes the whole continuum we discussed in
the Caching section: bind a variable, store the triple (F,x,F(x)), store enough
information so that F(x) could be recomputed very quickly, store information
which would make F'(x') more easily computable (for x' similar to x, and F'
similar to F), store away a partial model of the world with good and bad features
tagged.  Notice that moving along this continuum, programs grow
more flexible, more descriptive (less scalar), more dynamic.
In all cases, the
motivation for doing this storage is to (i) speed up the recognition of
(the need to do) similar computations in the future, and (ii) speed up
or even eliminate those computations when you decide they {\it are} relevant
and you want the values they would return.  These are the two uses of
scripts, but also of cached constants (e.g., caching the value returned
by OWNED-BY, a function from CARS to PEOPLE). A better example is {\it user models}:
they range from a single bit (e.g., a NOVICE/PRO toggle), 
to a few numbers (e.g., in PARRY),
to a dynamic concept-modelling scheme (e.g., in Eurisko).
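At the simplest interesting point on that continuum, ``store and recall''
amounts to keeping (F,x,F(x)) triples and consulting them before recomputing.
A minimal sketch follows; the cache representation is only illustrative, and no
reasoning about {\it whether} a given value deserves to be cached is included.
A call such as {\:c (CACHED-APPLY (QUOTE SQRT) 2)} computes the value once;
subsequent identical calls merely recall the stored triple, and the cached
OWNED-BY function mentioned above would work the same way.

\parskip 1pt

{\:c (DEFVAR *CACHE* NIL)  ; a list of (fn arg value) triples}

{\:c (DEFUN FIND-TRIPLE (fn arg triples)}

{\:c (COND ((NULL triples) NIL)}

{\:c ((AND (EQ fn (CAAR triples)) (EQUAL arg (CADAR triples))) (CAR triples))}

{\:c (T (FIND-TRIPLE fn arg (CDR triples)))))}

{\:c (DEFUN CACHED-APPLY (fn arg)}

{\:c (LET ((hit (FIND-TRIPLE fn arg *CACHE*)))}

{\:c (COND (hit (CADDR hit))  ; recall rather than recompute}

{\:c (T (LET ((value (FUNCALL fn arg)))}

{\:c (PUSH (LIST fn arg value) *CACHE*)  ; store for next time}

{\:c value)))))}

\parskip 20pt plus 3pt minus 1pt \lineskip 8pt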

In the interests of cognitive economy, we conclude that programs should
build up expectations, monitor them,
and change their models as needed.  A natural mechanism for doing this is
the use of pattern-directed knowledge modules (PDMs), which can react to
changing situations, to sudden disconfirmations of predictions.
One use of PDMs is as triggers or demons; e.g., if
you hear someone shout ``Fire!" in a theater, that takes precedence over
Jane Fonda.
A second use of PDMs arises when an expectation is not met: the program then
also  has some chance of  modifying the culprit
rule(s) that made that false prediction (and/or
adding new rule(s) which would make the right one.)

%[[Mention TERIESIAS??]]	
%
%[[ignore:	This ties learning in very closely to scientific theory
%			formation (at least the theories of knowledge which
%			are activist, revolutionary conventionalist activist,
%			methodological falsificationist, sophisticated.
%			[those are the nodes descending in a genl/spec hier
%			of theories of knowledge]
%
%]]

	\ASSEC{Levels of Abstraction}

{\bf Summary:\  \sl 
An intelligent system's domain model is typically an approximation of reality.
It is often the case that knowledge exists at several levels of detail and
abstraction.  The amount of detail required in intelligent processing depends
on the problem-solving context.  By appropriately structuring knowledge, a
problem-solver can take advantage of abstraction by reasoning at the level
appropriate to the given situation.  We argue here that abstraction
is crucial for intelligent systems which must operate over large
knowledge bases or within widely-varying time constraints.}

\yskip

%[[mention McCarthy's common-sense reasoning example: a spilling coffee cup;
%we don't try to integrate the hydrodynamic differntial equations]]


In many real-life problem solving situations, the ``goal" is not a single,
precise answer. Rather, there is a cost-benefit tradeoff between the
accuracy of the solution and the amount of resources consumed in producing
it.  
Computer programs which are designed only to return exact answers
lie at one extreme end of this curve, and rarely near the optimal point (of
diminishing returns).

This presupposes a kind of ``continuity'' in the
environment, as we discussed in Appendix 2.1. Namely, truncated search is
assumed to be better than random selection of a legal move. 
Almost paradoxically, if the environment
is mercurial, then the optimal point on the tradeoff will cause our program to
be very {\it expressive}. 

Cognitive systems which must cope with the world have an omnipresent need to
approximate reality --- a need which is well met by maintaining multiple
representations of the world  at various levels of abstraction;
i.e., an entry in the knowledge base has all its generalizations also
stored explicitly, redundantly. 

In addition to aiding us in quickly approximating reality,\foo{
In Section 1, we described a situation in which a hospital simulation
program might be required upon occasion to provide as good an estimate as
possible to a query within a given (short) time period. }
multiple levels
of abstraction (``MLA") are useful for
{\it analogizing}: they can 
make that process easier, faster, and occasionally even
more valid. Analogies can be recognized when two distinct concepts coincide
at some abstract level.\foo{Conversely, if (corresponding
slots of) two apparently related
concepts (such as two homonyms) do not share related structure at higher
levels of abstraction, that suggests that an analogy between them would be
superficial and lacking in power.  }


Multiple levels of abstraction cost a certain
amount to implement, both initially (man-hours of programming) and as the
program runs (cpu time spent maintaining the redundancy, choosing which level
to work on next, etc.) For problems of all sizes, this feature aids in the
expression of new problems to the program, since they may be input at whatever
level(s) seem appropriate to the user. Thus multiple levels of abstraction
will always add to expressiveness, and will either detract from or add to
efficiency depending upon how small or large the task is.
MLA is very useful when the program's resources are
curtailed or, even better, varied dramatically in magnitude from run to run
(a program which was {\it always} limited to a 15-second execution time would
be designed specifically for that task, but a program which might get anywhere
from 5 seconds to 5 hours for the same simulation question could better
benefit from multiple levels of abstraction.)
MLA
contributes more to efficiency
as the quotient (task size)/(resource allotment) increases in size.


There is a more general result here: the methods which humans find useful for
easy expression of problems may be useful to programs attacking very large
problems.  For instance, consider the decision about whether to do inference
by a system of pattern-invoked experts, or rather to preprocess the triggering
conditions into a big discrimination network. The former approach is more
expressive, the latter more efficient --- at least for small problems. For
very large tasks, the need to remain flexible grows very large. It is then
more economical to retain pattern-directed control (the compilation of the
patterns is so complex the whole procedure would have to be redone
frequently during the course of execution, if we opted for that alternative
instead.)  A case in point is the HARPY speech understanding program 
[Lowerre & Reddy 1979]:
whenever the grammar changes --- at any level --- the entire speech recognition
network must be recompiled. HEARSAY II [Lesser and Erman 1977],
while slower in performance, has
sufficient flexibility (from several independent pattern-directed knowledge
sources) to obviate such monolithic integrative recompilation delays.

\vfill\eject
\NNSECP{References}

\parskip 10pt plus 1pt minus 1pt \lineskip 1pt

Adams, J. L.,
{\it Conceptual Blockbusting},
W.H. Freeman: San Francisco, 1974.

Anderson, R. H. and J. J. Gillogly,  ``Rand intelligent terminal agent
(RITA):  design philosophy.''  R-1809-ARPA, The Rand Corporation,
Santa Monica, Calif., 1976.

Balzer, R., N. Goldman, and D. Wile, ``Informality in program
specifications,''
{\it Proc. of the 5th Int. Joint Conf. on Artificial Intelligence},
Cambridge, Mass., 1977, 389-397.

Barstow, D., ``A knowledge-based system for automatic program
construction,''
{\it Proc. of the 5th Int. Joint Conf. on Artificial Intelligence},
Cambridge, Mass., 1977, 382-388.

Berliner, H. J., ``Chess as problem solving: the developments of a tactics
analyzer,'' Ph.D. Dissertation, Carnegie-Mellon University, Pittsburgh, 1974.

Bobrow, D. G. and B. Raphael, ``New programming languages for
artificial intelligence research,''
{\it Computing Surveys}, {\bf 6},
1974, 153-174.

Bobrow, D. G. and T. Winograd, ``An overview of KRL, a knowledge
representation language,''
{\it Cognitive Science}, {\bf 1},
1977, 3-46.

Brachman, R., ``Theoretical studies in natural language
understanding'', BBN Report 3888, September, 1978.

Burstall, R. M. and J. Darlington, ``A transformation system for
developing recursive programs,''
{\it J. ACM}, {\bf 24},
1977, 44-67.

Cohen, H., ``What is an Image?'', {\it Proc. Sixth International Joint
Conference on Artificial Intelligence,} Tokyo, August, 1979,  1028-51.

Collins, A. M. and M. R. Quillian, ``Retrieval time from semantic memory,''
{\it J. Verbal Learning and Verbal Behavior}, {\bf 8},
1969, 240-247.

Collins, A. M. and M. R. Quillian, ``How to make a language user,'' in
(E. Tulving and D. Donaldson, eds.)
{\it Organization of Memory},
Academic Press: New York, 1972.

Conrad, C.  ``Cognitive economy in semantic memory,''
{\it J. Experimental Psychology}, {\bf 92},
1972, 149-154.

Darlington, J. and R. M. Burstall, ``A system which automatically improves
programs,''
{\it Proc. of the 3rd Int. Joint Conf. on Artificial Intelligence},
Stanford, Calif., 1973, 479-485.

Dijkstra, E. W.,
{\it A Discipline of Programming},
Prentice-Hall: Englewood Cliffs, N.J., 1976.

Feigenbaum, E. A.  ``The art of artificial intelligence:  themes and
case studies of knowledge engineering,''
{\it Proc. of the 5th Int. Joint Conf. on Artificial Intelligence},
Cambridge, Mass., 1977, 1014-1029.

Fikes, R. E., P. E. Hart, and N. J. Nilsson, ``Learning and executing
generalized robot plans,''
{\it J. Artificial Intelligence}, {\bf 3},
1972, 251-288.

Fiksel, J. R. and G. H. Bower, ``Question answering by a semantic
network of parallel automata,''
{\it J. Math. Psych.}, {\bf 13},
1976, 1-45.

Gelernter, H., ``Realization of a geometry-theorem proving machine,'' in
(E. A. Feigenbaum and Julian Feldman, eds.),
{\it Computers and Thought},
McGraw-Hill: New York, 1963, 134-152.

Green, C., ``A summary of the PSI program synthesis system,''
{\it Proc. of the 5th Int. Joint Conf. on Artificial Intelligence},
Cambridge, Mass., 1977, 380-381.

Green, C. and D. Barstow,  ``On program synthesis knowledge,''
{\it J.  Artificial Intelligence}, {\bf 10},
1978, 241-279.

Hayes-Roth, B. and F. Hayes-Roth,  ``Plasticity in memorial networks,''
{\it J. Verbal Learning and Verbal Behavior}, {\bf 14},
1975, 506-522.

Hayes-Roth, B. and F. Hayes-Roth,   ``The prominence of lexical information
in memory representations of meaning,'' {\it J. Verbal Learning
and Verbal Behavior}, {\bf 16}, 1977, 119-136.

Hayes-Roth, B. and C. Walker,   ``Configural effects in human memory,''
{\it Cog\-ni\-tive Sci\-ence},
in press, 1979.

Hayes-Roth, F., P. Klahr, J.  Burge, and D. J. Mostow, ``Machine
methods for acquiring, learning, and applying knowledge,''  P-6241,
The Rand Corporation, Santa Monica, Calif., 1978.

Hayes-Roth, F. and V. R. Lesser,   ``Focus of attention in the Hearsay-II
speech understanding system,''
{\it Proc. of the 5th Int. Joint Conf. on Artificial Intelligence},
Cambridge, Mass., 1977, 27-35.

Kahn, K., ``Making Aesthetic Choices'', {\it Proc. 6IJCAI,}
Tokyo, 1979,  448-50.

Kant, E., ``The selection of efficient implementations for a high level
language,''
{\it Proc. ACM SIGART-SIGPLAN Symp. on Artificial Intelligence and Programming
Languages},
SIGART Newsletter 64, 1977, 140-146.

Klahr, P., ``Planning techniques for rule selection in deductive
ques\-tion-ans\-wer\-ing,'' in (D. A. Waterman and F. Hayes-Roth, eds.),
{\it Pattern-Directed Inference Systems},
Academic Press: New York, 1978.

Knuth, D., {\it Fundamental Algorithms
(The Art of Computer Programming, Volume 1)}, Addison-Wesley, 1968.

Kuhn, T., {\it The Structure of Scientific Revolutions,} 1972.

Lakatos, I., {\it Proofs and Refutations},
Cambridge University Press: Cambridge, 1976.

Lenat, D. B., ``Beings:  knowledge as interacting experts,''
{\it Proc. of the 4th Int. Joint Conf. on Artificial Intelligence},
Tbilisi, USSR, 1975, 126-133.

Lenat, D. B., ``AM: an artificial intelligence approach to discovery in
mathematics as heuristic search,''  Memo AIM-286, Stanford AI Lab, 1976.

Lesser, V. R. and L. D. Erman, ``A retrospective view of the Hearsay-II
architecture,''
{\it Proc. of the 5th Int. Joint Conf. on Artificial Intelligence},
Cambridge, Mass., 1977, 790-800.

Low, J., ``Automatic coding:  choice of data structures,''  Memo AIM-242,
Stanford University, 1974.

Lowerre, B. and D. R. Reddy, ``The Harpy system,'' in (W. Lea, ed.),
{\it Recent Trends in Speech Recognition},
Prentice-Hall: Englewood Cliffs, N. J., in press.

McCarthy, J., ``The advice taker,'' in (M. Minsky, ed.),
{\it Semantic Information Processing,}
MIT Press: Cambridge, Mass., 1968.

Mostow, J. and F. Hayes-Roth,  ``Machine-aided heuristic programming: a
paradigm for knowledge engineering,'' N-1007-NSF, The Rand Corporation,
Santa Monica, Calif., 1979.

Newell, A., J. C. Shaw, and H. A. Simon,  ``Chess-playing programs
and the problem of complexity,'' in (Feigenbaum and Feldman, eds.)
{\it Computers and Thought},
McGraw-Hill: New York, 1963, 39-70.

Newell, A. and H. A. Simon, 
{\it Human Problem Solving},
Prentice-Hall: Englewood Cliffs, N.J., 1972.

Nilsson, N., {\it Problem Solving Methods in Artificial Intelligence}, 1972.

Rich, E., ``Building and Exploiting User Models'', {\it Proc. 6IJCAI,}
Tokyo, 1979,  720.

Rips, L. J., E. J. Shoben, and E. E. Smith,   ``Semantic distance and
the verification of semantic relations,''
{\it J. Verbal Learning and Verbal Be\-ha\-vi\-or}, {\bf 12},
1973, 1-20.

Samuel, A., ``Some studies in machine learning using the game of checkers,'' in
(E. A. Feigenbaum and Julian Feldman, eds.),
{\it Computers and Thought},
McGraw-Hill: New York, 1963.

Sandewall, E., private communication, August 22, 1979.

Stefik, M.  ``An examination of a frame-structured representation system,''
Memo HPP-78-13, Stanford University, 1978.

Waterman, D. A., R. H. Anderson, F. Hayes-Roth, P. Klahr, G. R. Martins, and S.
Rosenschein, ``Design of a rule-oriented system for implementing
expertise,''  The Rand Corporation, Santa Monica, 1979.

Waterman, D. A. and F. Hayes-Roth, 
{\it Pattern-Directed Inference Systems},
Academic Press: New York, 1978.


\vfill\end